Generic Inverted Index on the GPU
Abstract
Data variety, as one of the three Vs of the Big Data, is manifested by a growing number of complex data types such as documents, sequences, trees, graphs and high dimensional vectors. To perform similarity search on these data, existing works mainly choose to create customized indexes for different data types. Due to the diversity of customized indexes, it is hard to devise a general parallelization strategy to speed up the search. In this paper, we propose a generic inverted index on the GPU (called GENIE), which can support similarity search of multiple queries on various data types. GENIE can effectively support the approximate nearest neighbor search in different similarity measures through exerting Locality Sensitive Hashing schemes, as well as similarity search on original data such as short document data and relational data. Extensive experiments on different real-life datasets demonstrate the efficiency and effectiveness of our system.
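As a rough illustration of how a single inverted index can serve similarity search for several queries at once, the sketch below ranks objects by the number of keywords they share with each query. The shared-keyword count as a similarity score, the function names `build_inverted_index` and `top_k`, and the toy data are all assumptions made for illustration; the abstract does not specify GENIE's scoring model or API, and this sequential CPU code does not reproduce any GPU parallelization.

```python
from collections import Counter, defaultdict


def build_inverted_index(objects):
    """Map each keyword to the list of object ids that contain it."""
    index = defaultdict(list)
    for obj_id, keywords in enumerate(objects):
        for kw in set(keywords):  # de-duplicate keywords within one object
            index[kw].append(obj_id)
    return index


def top_k(index, query, k):
    """Rank objects by how many distinct query keywords they share (assumed score)."""
    counts = Counter()
    for kw in set(query):
        for obj_id in index.get(kw, ()):  # scan the posting list of this keyword
            counts[obj_id] += 1
    # Sort by descending count, then ascending id so ties are deterministic.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))[:k]


# Hypothetical toy data: each object is a bag of keywords.
docs = [
    ["gpu", "inverted", "index"],
    ["gpu", "similarity", "search"],
    ["tree", "graph", "sequence"],
]
index = build_inverted_index(docs)

# Multiple queries are answered against the same index.
queries = [["gpu", "index"], ["graph", "search"]]
results = [top_k(index, q, k=2) for q in queries]
print(results)
```

In a GPU setting the per-query and per-posting-list loops are the natural candidates for parallel execution, since each query's counts are independent of the others.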
Similar resources
A Generic Inverted Index Framework for Similarity Search on the GPU - Technical Report
Data variety, as one of the three Vs of the Big Data, is manifested by a growing number of complex data types such as documents, sequences, trees, graphs and high dimensional vectors. To perform similarity search on these data, existing works mainly choose to create customized indexes for different data types. Due to the diversity of customized indexes, it is hard to devise a general paralleliz...
Efficient Parallel Lists Intersection and Index Compression Algorithms using Graphics Processing Units
Major web search engines answer thousands of queries per second requesting information about billions of web pages. The data sizes and query loads are growing at an exponential rate. To manage the heavy workload, we consider techniques for utilizing a Graphics Processing Unit (GPU). We investigate new approaches to improve two important operations of search engines – lists intersection and inde...
A GPU-based parallel algorithm for time series pattern mining
Mining time series patterns is an important research area, in which computing the LCSS (Longest Common Subsequence) between high-dimensional time series is one of the most important issues. Practical applications need to handle large-scale data, so research into efficient retrieval methods has practical value. Based on the issues above, we propose an efficient parallel algorithm to...
Comparison between BMI and Inverted BMI in Evaluating Metabolic Risk and Body Composition in Iranian Children
Objectives: To compare BMI and inverted BMI in evaluating body measurements, resting blood pressure, dual-energy X-ray absorptiometry (DEXA) parameters of fat mass, and metabolic risk factors in Iranian children. Materials and Methods: This is a cross-sectional study of 477 children aged 9-18 years in the South of Iran. Weight, height, resting blood pressure, waist and hip circumference and puberta...
Fast Cellular Automata Implementation on Graphic Processor Unit (GPU) for Salt and Pepper Noise Removal
Noise removal is commonly applied as a pre-processing step before subsequent image processing tasks, because noise arises during image acquisition or transmission. A common problem in imaging systems that use CMOS or CCD sensors is the appearance of salt and pepper noise. This paper presents Cellular Automata (CA) framework for noise removal of distorted image by the salt an...
Journal title:
- CoRR
Volume abs/1603.08390
Pages -
Publication date 2015